SEPA: Approximate Non-subjective Empirical p-Value Estimation for Nucleotide Sequence Alignment

Authors

  • Ofer H. Gill
  • Bud Mishra
Abstract

In the bioinformatics literature, pairwise sequence alignment methods appear with many variations and diverse applications. With this abundance comes not only an emphasis on speed and memory efficiency, but also a need to assign confidence to the computed alignments through p-value estimation, especially for important segment pairs within an alignment. This paper examines an empirical technique, called SEPA, for approximate p-value estimation based on a statistically large number of observations over randomly generated sequences. Our empirical studies show that the technique remains effective in identifying biological correlations even in sequences of low similarity and large expected gaps, and the experimental results shown here point to many interesting insights and features.
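The general idea of estimating a p-value from observations over randomly generated sequences can be illustrated with a minimal Python sketch. This is not the SEPA procedure itself, whose details are not given in the abstract. It is a generic Monte Carlo estimate under stated assumptions: the null model draws uniform random nucleotides, the alignment score is a plain Smith-Waterman local score with a linear gap penalty, and the names `local_alignment_score` and `empirical_p_value` as well as the scoring parameters are hypothetical choices made for illustration.

```python
import random


def local_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Best Smith-Waterman local alignment score (linear gap penalty).

    The scoring parameters are illustrative assumptions, not values
    taken from the paper.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


def empirical_p_value(a, b, trials=200, seed=0):
    """Fraction of random sequence pairs scoring at least as high as (a, b).

    Assumes a uniform i.i.d. nucleotide null model. The +1 terms keep the
    estimate strictly positive when no random pair reaches the observed score.
    """
    rng = random.Random(seed)
    observed = local_alignment_score(a, b)
    hits = 0
    for _ in range(trials):
        ra = "".join(rng.choice("ACGT") for _ in range(len(a)))
        rb = "".join(rng.choice("ACGT") for _ in range(len(b)))
        if local_alignment_score(ra, rb) >= observed:
            hits += 1
    return (hits + 1) / (trials + 1)
```

With this estimator the smallest reportable p-value is 1 / (trials + 1), so resolving very small p-values requires a correspondingly large number of random observations, which is consistent with the abstract's emphasis on a statistically large sample.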

Similar resources

MACRO-PERFECTOS-APE — MAtrix CompaRisOn & PrEdicting Regulatory Functional Effect of SNPs by Approximate P-value Estimation

Here we present MACRO-APE and PERFECTOS-APE software designed for practical sequence analysis involving classic mononucleotide and dinucleotide position weight matrices (PWMs) of DNA sequence patterns often called motifs. The common usage case for DNA motifs is representation of transcription factor binding sites. The software allows (1) comparing different PWMs using a variant of Jaccard simil...

Identification and characterization of a NBS–LRR class resistance gene analog in Pistacia atlantica subsp. Kurdica

P. atlantica subsp. Kurdica, with the local name of Baneh, is a wild medicinal plant which grows in Kurdistan, Iran. The identification of resistance gene analogs holds great promise for the development of resistant cultivars. A PCR approach with degenerate primers designed according to conserved NBS-LRR (nucleotide binding site-leucine rich repeat) regions of known disease-resistance (R) gene...

Correcting the Bias of Empirical Frequency Parameter Estimators in Codon Models

Markov models of codon substitution are powerful inferential tools for studying biological processes such as natural selection and preferences in amino acid substitution. The equilibrium character distributions of these models are almost always estimated using nucleotide frequencies observed in a sequence alignment, primarily as a matter of historical convention. In this note, we demonstrate th...

Enhancing Parallelism of Pairwise Statistical Significance Estimation for Local Sequence Alignment

Pairwise statistical significance (PSS) has been found to be able to accurately identify related sequences (homology detection), which is a fundamental step in numerous applications relating to sequence analysis. Although more accurate than database statistical significance, it is both computationally intensive and data intensive to construct the empirical score distribution during the estimati...

Empirical Bayes Estimation in Nonstationary Markov chains

Estimation procedures for nonstationary Markov chains appear to be relatively sparse. This work introduces empirical Bayes estimators for the transition probability matrix of a finite nonstationary Markov chain. The data are assumed to be of a panel study type in which each data set consists of a sequence of observations on N>=2 independent and identically dis...

Publication date: 2006